Statistics of Television Signals 
(Manuscript received February 28, 1952)

Measurements have been made of some basic statistical quantities characteri- 
zing picture signals. These include various amplitude distributions, auto- 
correlation, and correlation among successive frames. The methods of meas- 
urement are described, and the results are used to estimate the amount by 
which the channel capacity required for television transmission may be re- 
duced through exploitation of the statistics measured. 

INTRODUCTION 

One of the teachings of information theory is that most communica- 
tion signals convey information at a rate well below the capacity of the 
channels provided for them. The excess capacity is required to accom- 
modate the redundancy, or repeated information, which the signals 
contain in addition to the actual information. Removal of some of this 
redundancy would reduce the channel capacity required for transmission, 
thus opening the way for possible bandwidth reduction. In order to 
remove redundancy, one must first understand it; the amount and nature 
of the redundancy can be completely defined in terms of various statis- 
tical parameters characterizing the signal. 

It has been pointed out that the existence of redundancy is particularly
evident in the case of television; moreover, its elimination is highly
desirable because of the large bandwidth presently required for
transmission. Evidence of redundancy is found in the subject matter of
television: the average scene or picture. Knowing part of a picture, one can
generally draw certain inferences about the remainder; or, knowing a
sequence of frames, one can, on the average, make a good guess or
prediction about the next frame. In either case, knowledge of the past
removes uncertainty as to the future, leaving less actual information to be
transmitted.

Another way of looking at this is to visualize the picture as an array
of approximately 210,000 dots, 500 vertically and 420 horizontally,
corresponding, respectively, to the 500 scanning lines and 420 resolvable
picture elements per line of the standard television raster. Each dot can
have, say, 100 distinguishable brightness values in a good-quality
picture. The number of possible combinations is therefore approximately
$100^{210{,}000}$, or $10^{420{,}000}$. At the usual rate of 30 frames per second it would
take approximately $10^{419{,}991}$ years to transmit all these "pictures," which
our present television system is fully prepared to transmit! The vast
majority of these "pictures" will, of course, never be transmitted in this
age, because the average picture statistics virtually preclude the
possibility of their occurrence.
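The figures in the preceding paragraph are far too large to compute directly, but their logarithms are easy to check. The sketch below does that arithmetic; the only assumption added here is a 365.25-day year. It also reproduces the count of distribution curves given in the footnote on complete statistics (210,000 times $10^{420{,}000}$).

```python
import math

# Raster figures quoted in the text
LINES = 500              # scanning lines
ELEMENTS_PER_LINE = 420  # resolvable picture elements per line
LEVELS = 100             # distinguishable brightness values per element
FRAMES_PER_SECOND = 30
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year

dots = LINES * ELEMENTS_PER_LINE                 # picture elements per frame
log10_pictures = dots * math.log10(LEVELS)       # log10 of LEVELS ** dots
log10_years = (log10_pictures
               - math.log10(FRAMES_PER_SECOND)
               - math.log10(SECONDS_PER_YEAR))   # log10 of total transmission time in years
# Footnote: one distribution per element for each possible past history
log10_curves = math.log10(dots) + log10_pictures

print(dots)                       # 210000
print(round(log10_pictures))      # 420000
print(math.floor(log10_years))    # 419991
print(math.floor(log10_curves))   # 420005
```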

If all of the redundancy alluded to in the preceding paragraph were 
to be expressed in terms of statistics, the array of data would be stagger- 
ing.* Redundancy encompassing even a small part of a single frame 
implies statistics of enormously high order because of the large number 
of possible past histories. The initial attention should therefore be 
focused on local redundancy, encompassing only a few adjoining pic- 
ture elements. Accordingly, measurements have been made of the fol- 
lowing statistical quantities. 

1. Simple probability distribution of signal amplitudes corresponding 
to picture brightness. This encompasses only a single picture element, 
revealing the relative probabilities of this or any element's assuming 
the various possible brightness values, in the absence of any past-his- 
tory information. 

2. Simple probability distribution of error amplitudes resulting from 
linear prediction of television signals. Only the simplest type of linear 
prediction is considered here, so-called previous-value prediction, which 
predicts each picture element to have the same brightness value as the 
preceding one. The prediction error signal is simply the difference be- 
tween the picture signal and a replica delayed by one Nyquist interval 
(one-half the reciprocal bandwidth or the time interval corresponding 
to the spacing between picture elements). The distribution of this error 
signal encompasses two picture elements (past history of one element) 
and therefore is a condensed version of the family of first-order joint 
probability distributions. 

3. Autocorrelation of typical pictures. This statistical quantity is an
even more streamlined version of various families of different-order
joint probability distributions. Each family corresponds to just a single
point on the autocorrelation curve; the ordinates of the curve represent
the average correlation between picture elements spaced by various
distances. This correlation, say, between horizontally adjoining elements,
is simply the average product of the two brightness values of each pair
of neighbors, relative to the average square of all brightness values.

* Complete statistics extending, say, over one frame period would comprise
one conditional probability distribution per picture element for each possible
past history. With the approximate figures cited above, the number of distribution
curves (many of which would be similar) is $210{,}000 \times 10^{420{,}000}$, or about $10^{420{,}005}$.
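Items 2 and 3 can be illustrated on a single scan line. The sketch below is a minimal illustration, not the measurement method used in this work: it assumes a scan line is given as a list of brightness samples spaced one Nyquist interval apart, forms the previous-value prediction error, and computes the adjacent-element correlation exactly as item 3 words it (average product of neighbors relative to the average square, with no mean removed). The function names and the sample line are hypothetical.

```python
import numpy as np

def previous_value_error(line):
    """Prediction error when each sample is predicted to equal the preceding one."""
    x = np.asarray(line, dtype=float)
    return x[1:] - x[:-1]   # signal minus a replica delayed by one sample

def adjacent_correlation(line):
    """Average product of neighbouring samples, relative to the average square."""
    x = np.asarray(line, dtype=float)
    return float(np.mean(x[1:] * x[:-1]) / np.mean(x ** 2))

# Hypothetical scan line: a nearly flat region followed by an edge
line = [10, 12, 12, 13, 40, 41]
print(previous_value_error(line).tolist())   # [2.0, 0.0, 1.0, 27.0, 1.0]
print(round(adjacent_correlation(line), 3))  # 0.807
```

Note that the single large error comes from the edge; in flat regions the error stays near zero, which is what makes the error distribution peaked.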

The three quantities enumerated above contain a great deal of
statistics in very compact form, but these statistics are essentially of a local
and linear nature. They do not include the bulk of the large-scale
redundancy, which is of a far-flung and nonlinear nature.

AUTOCORRELATION 

For a function of time, f(t), the autocorrelation can be expressed as

$$\phi(\tau) = \overline{f(t)\,f(t + \tau)} \tag{1}$$

averaged over all time, for various values of the time shift $\tau$. In the case
of a picture transparency, the optical transmission is a function of
two-dimensional space, expressible in polar coordinates as $T(s/\theta)$, and the
autocorrelation can be expressed in analogous fashion. The time variable
$t$ is replaced by the space coordinate $s/\theta$, and the correlation time shift $\tau$
is replaced by a space shift $\Delta s/\theta$, so that the new expression is

$$\phi(\Delta s/\theta) = \overline{T(s/\theta)\,T(s/\theta + \Delta s/\theta)} \tag{2}$$

averaged over as much area as practicable. This space-domain
autocorrelation is much easier to measure than the time-domain
autocorrelation. We need merely measure the relative optical transmission of two
identical cascaded transparencies, shifted from register by a variable
amount. The averaging process is inherent in such a measurement, since the
light transmitted through the whole exposed area is collected at once.

Fig. 1 — Picture autocorrelator.

Fig. 2 — Basic arrangement of picture autocorrelator.

Fig. 3 — Close-up view of slide holding assembly and shifting mechanism of
picture autocorrelator.
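Eq. (2) has a direct discrete analogue for a picture stored as a two-dimensional array of transmission samples: multiply the picture by a shifted copy of itself and average over the overlapping area, which is what the cascaded transparencies do optically. The sketch below is an illustration under that assumption. Shifts are given as whole numbers of samples (dx horizontally, dy vertically) rather than in the polar form $\Delta s/\theta$, and the function name is hypothetical.

```python
import numpy as np

def spatial_autocorrelation(T, dx, dy):
    """Mean of T(x, y) * T(x + dx, y + dy) over the region where both samples exist."""
    T = np.asarray(T, dtype=float)
    rows, cols = T.shape
    # Index ranges of the unshifted copy that keep the shifted copy inside the array
    y0, y1 = max(0, -dy), min(rows, rows - dy)
    x0, x1 = max(0, -dx), min(cols, cols - dx)
    a = T[y0:y1, x0:x1]
    b = T[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    return float(np.mean(a * b))

# Vertical stripes: perfectly correlated vertically, alternating horizontally
stripes = np.array([[1, 0, 1, 0],
                    [1, 0, 1, 0],
                    [1, 0, 1, 0]])
print(spatial_autocorrelation(stripes, 0, 0))  # 0.5
print(spatial_autocorrelation(stripes, 0, 1))  # 0.5
print(spatial_autocorrelation(stripes, 1, 0))  # 0.0
print(spatial_autocorrelation(stripes, 2, 0))  # 0.5
```

Like the optical measurement, this raw average does not fall to zero for large shifts when the transmission is everywhere positive, because it still contains the mean-square level of the picture.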

The apparatus used to measure autocorrelation is shown in Figs. 1 
and 2. The chamber at the bottom contains a light source of very 
constant intensity and a convex lens to collimate the light. The middle 
part, made of accurately machined aluminum, holds the two identical 
slides of the picture under test, and an aperture exposing a large circular 
area of the slides. The top chamber contains a collector lens and a photo- 
multiplier tube which (on a microammeter not shown) gives a sensitive 
indication of the total light transmitted through the slides. Fig. 3 
shows a close-up view of the slide-holding assembly. Two close-fitting 
graduated aluminum rings permit accurately determined rotation of 
both slides or one slide, and the micrometer drive permits translational 
displacements measurable to within one mil (moving the two slides by 
equal and opposite amounts); the separation between picture elements 
is approximately 7.5 mils horizontally and 5 mils vertically (for the 
W by S\" slide size used). 

The light transmission is always a maximum when the two slides are
in precise register ($\Delta s = 0$). For large shifts the transmission fluctuates
about a nonzero asymptote. The nonzero asymptote results from the
fact that the average transmission is always positive, and the fluctuation
from the fact that large displacements introduce substantial amounts of
new picture material into the aperture. Since these components tend to
obscure the correlation effects, it is useful to make additional
measurements which enable us to subtract them out completely. This leaves us
with a "pure" autocorrelation $A(\Delta s/\theta)$, which is then normalized so as
to have a peak value of unity. It is given by

$$A(\Delta s/\theta) = \frac{T_2\!\left(\frac{\Delta s}{2}/\theta\right) - T_1\!\left(\frac{\Delta s}{2}/\theta\right) T_1\!\left(-\frac{\Delta s}{2}/\theta\right)}{T_2(0) - T_1^2(0)} \tag{3}$$

where $T_2\!\left(\frac{\Delta s}{2}/\theta\right)$ is the transmission through the two cascaded slides
shifted by equal and opposite amounts $\frac{\Delta s}{2}$ at an angle $\theta$ with the
horizontal, and $T_1\!\left(\pm\frac{\Delta s}{2}/\theta\right)$ is the transmission of a single slide with
displacement $\pm\frac{\Delta s}{2}$ at the same angle $\theta$.
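The measurement behind Eq. (3) can be mimicked numerically to see why the single-slide readings are needed. The sketch below is a simulation under stated assumptions, not a description of the apparatus. A synthetic transparency (smoothed random noise, a hypothetical stand-in for a real picture) is viewed through a fixed square aperture. Each reading is taken as the mean transmission over that aperture, and the two slides are moved horizontally by equal and opposite whole-sample amounts h, so the relative shift is $\Delta s = 2h$ at $\theta = 0$. All names are hypothetical.

```python
import numpy as np

def box_smooth(x, w):
    """Separable moving-average filter, used only to make a correlated test picture."""
    k = np.ones(w) / w
    x = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, x)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, x)

def view(T, h, top, left, size):
    """Part of the slide seen through the aperture when the slide is moved right by h samples."""
    return T[top:top + size, left - h:left - h + size]

def pure_autocorrelation(T, h, top=50, left=50, size=300):
    """Eq. (3) for slides moved by +h and -h samples (relative shift 2h, horizontal)."""
    v_plus = view(T, h, top, left, size)
    v_minus = view(T, -h, top, left, size)
    v_zero = view(T, 0, top, left, size)
    T2 = np.mean(v_plus * v_minus)            # two slides in cascade, shifted apart
    T1_plus, T1_minus = v_plus.mean(), v_minus.mean()  # single-slide readings
    T2_0 = np.mean(v_zero * v_zero)           # cascade in precise register
    T1_0 = v_zero.mean()
    return float((T2 - T1_plus * T1_minus) / (T2_0 - T1_0 * T1_0))

rng = np.random.default_rng(0)
# Positive transmission values, correlated over roughly 15 samples
T = 0.5 + box_smooth(rng.standard_normal((400, 400)), 15)
for h in (0, 1, 4, 15):
    print(2 * h, round(pure_autocorrelation(T, h), 3))
```

For this test picture the result equals 1 in register by construction and falls off roughly linearly, since a moving average of white noise has a triangular autocorrelation.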




Fig. 4 — Test pictures whose statistics are included in this article. (Panels include Scene C and Scene D.)

Fig. 5 — Plots of autocorrelation in horizontal and vertical directions for two
pictures.



Fig. 4 shows some pictures for which autocorrelation measurements
have been made. The results can be presented in the various ways
shown in Figs. 5, 6, and 7. Fig. 5 shows conventional plots of $A$ versus
$\Delta s$ in the horizontal and vertical directions. Scene B is seen to have more
correlation than Scene D, and curve shapes range from remarkably
linear to roughly exponential. Fig. 6, giving contours of constant
autocorrelation, brings out the variation with the angle $\theta$. Scene A
happens to have its greatest correlation in the vertical direction, but
that was not found to be a general rule by any means; Scene B, for
example, has its greatest correlation in the horizontal direction. No
preferred directions appear to exist in general. In Fig. 7 attention is focused
on the more local correlation, for small values of $\Delta s$. The average
correlation among horizontally adjoining picture elements, designated by
$A_{10}$, is seen to be approximately 0.99 for Scene B and only 0.75 for
Scene C. $A_{20}$ denotes the correlation for a horizontal spacing of two
picture elements, while $A_{01}$ denotes the correlation among vertically
adjoining picture elements.

It should be pointed out that the pictures which gave the above 
results were not band-limited to the standard 4-mc resolution. However, 
before the results were used quantitatively, the proper band limitation 
was applied mathematically. This has the effect of rounding off the peaks 
of the curves, decreasing the autocorrelation drop within the first Ny- 
quist interval by up to approximately 24 per cent. 
Fig. 6 — Contours of constant autocorrelation for Scene A. In general there are
no preferred directions of correlation.

Fig. 7 — Plots of autocorrelation for small shifts. $A_{10}$ is the autocorrelation for
a shift of one horizontal elemental distance, $A_{20}$ for two horizontal elemental
distances, and $A_{01}$ for one vertical elemental distance. Alternatively, $A_{10}$ may be
described as the average correlation between horizontally adjoining elements,
etc.






PROBABILITY DISTRIBUTIONS * 

A probability distribution of amplitudes is generally shown as a plot
of probability density versus signal amplitude. The probability density
corresponding to, say, amplitude $x_1$ is the probability of finding the signal
amplitude between $x_1$ and $x_1 + dx$, divided by the differential amplitude
increment $dx$. Conversely, the probability of finding the signal amplitude
between $x_1$ and $x_1 + dx$ is given by $p(x_1)\,dx$, $p(x)$ being the probability
density corresponding to amplitude $x$.
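For sampled data this definition amounts to a normalized histogram: count the samples in each amplitude bin of width dx and divide by the total count and by dx. The sketch below uses NumPy's `histogram` with `density=True`, which performs that normalization. The test signal is synthetic (uniformly distributed amplitudes), chosen only so that the expected density, 1 everywhere, is known.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 1.0, size=100_000)   # synthetic amplitudes, uniform on [0, 1)

edges = np.linspace(0.0, 1.0, 21)               # 20 amplitude bins, each dx = 0.05 wide
p, edges = np.histogram(samples, bins=edges, density=True)  # p = count / (N * dx)

prob_per_bin = p * np.diff(edges)               # p(x1) dx = probability of landing in the bin
print(round(float(prob_per_bin.sum()), 6))      # 1.0
print(np.round(p[:3], 2))                       # each near 1 for a uniform signal
```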

If a cathode-ray spot is deflected, say horizontally, by the signal in
question, its average dwell time at any point is directly proportional to
the corresponding probability density. In the optical system shown in
Fig. 8, a cylindrical lens maps each point into a vertical line which is
then tapered in intensity by an optical density wedge before reaching a
high-contrast photographic film. Depending on the dwell time at any
amplitude level, the corresponding tapered line has enough average
intensity to blacken the film up to a certain level. This level is
proportional to $\log p(x)$, since the density wedge is tapered exponentially,
so that the intensity of each tapered line of light reaching the film
diminishes, say, by a factor of ten for each inch we travel up the line.
The film in effect traces out a contour of constant exposure.

Fig. 8 — Basic arrangement of probabiloscope.

Two or three iterated photographic printings increase the effective
gamma sufficiently to yield a contour of ample sharpness. This contour
is then changed to a sharp line by a simple darkroom trick: while the
film is in the development tray, already fully developed, it is momentarily
exposed to light. The blackened portion of the film is unaffected, the
clear portion is fully blackened, while the transition contour, being
partly opaque, is not fully blackened. By printing from this film we then
obtain a well-defined black-on-white curve of $p(x)$ versus $x$ on a
logarithmic probability scale. The logarithmic scale has the advantage of
making the curve shape independent of exposure length and giving
uniform relative accuracy over the entire range.
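The independence from exposure length can be checked with an idealized model of the probabiloscope. Assume that the film blackens wherever its exposure (proportional to dwell time times wedge transmission) exceeds a fixed threshold, and that the wedge passes $10^{-y}$ of the light at height y inches. The blackened height is then $y(x) = \log_{10}[E\,p(x)/\text{threshold}]$, where E stands for exposure length, so changing E adds the same constant $\log_{10} E$ at every amplitude. The model, the function name, and the test density are hypothetical.

```python
import numpy as np

def blackened_height(p, exposure, threshold=1.0):
    """Height (inches) up to which the film is blackened, assuming a wedge that
    attenuates by a factor of ten per inch: exposure * p * 10**(-y) = threshold.
    Negative values would mean the film is not blackened at that amplitude."""
    p = np.asarray(p, dtype=float)
    return np.log10(exposure * p / threshold)

x = np.linspace(-3, 3, 7)
p = np.exp(-np.abs(x))                        # hypothetical peaked density (unnormalized)
short_exp = blackened_height(p, exposure=1e3)
long_exp = blackened_height(p, exposure=1e5)
print(np.round(long_exp - short_exp, 6))      # the same 2.0 inches at every amplitude
```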

Fig. 9 shows some typical results obtained by means of the
"probabiloscope." The two small curves are distributions of two different still
pictures. The left-hand end corresponds to black, the right-hand end to
peak white; the blanking intervals (slightly blacker than black) cause
the peaks at the extreme left. (The signals did not contain any
synchronizing pulses.) The tall and slender curve at the right of Fig. 9 is the
distribution of errors resulting from previous-value prediction of one
of the pictures in Fig. 4. The peak corresponds to zero error, which is
seen to be most probable, as it should be if the prediction criterion is
good. Increasingly larger errors are increasingly improbable, or rare.
The six decades of probability density spanned by the curve were
obtained in three separate exposures and subsequently joined, since stray
light limits the useful range of the probabiloscope to approximately two
decades. In obtaining those sections of the curve corresponding to the
few and far-between large errors, a long exposure was used and the
cathode-ray beam was blanked whenever it passed through the range of
zero or small errors. The vertical scale on all curves is determined solely
by the taper of the optical density wedge. If this scale is to represent
true probability density, instead of a proportional quantity, it
should be shifted up or down so as to make the area under the curve
equal to unity.

Fig. 9 — Typical probability distributions as obtained from the probabiloscope.
Curves at left are for video signals; right-hand curve is for difference between
video signal and delayed replica.
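Because the vertical scale is logarithmic, shifting the curve up or down multiplies the density by a constant, so normalization reduces to finding one offset. A minimal sketch, assuming the curve has been read off as heights $y_i = \log_{10} p(x_i) + c$ on an evenly spaced amplitude grid, where the unknown constant c stands for the arbitrary vertical placement: c follows from the condition $\sum_i 10^{y_i - c}\,\Delta x = 1$. The test density and the function name are hypothetical.

```python
import numpy as np

def normalizing_offset(y, dx):
    """Offset c such that the density 10**(y - c) has unit area (rectangle rule)."""
    y = np.asarray(y, dtype=float)
    return float(np.log10(np.sum(10.0 ** y) * dx))

dx = 0.01
x = np.linspace(-10 + dx / 2, 10 - dx / 2, 2000)   # bin centres
true_log_p = np.log10(0.5 * np.exp(-np.abs(x)))     # peaked density with unit area
y = true_log_p + 1.7                                # curve plotted at an arbitrary height
c = normalizing_offset(y, dx)
print(round(c, 3))                                  # 1.7
```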

APPLICATION OF RESULTS 

The statistics measured can be put to various uses, such as in the
design of better predicting or coding schemes. The most interesting
application is probably in estimating the reduction in channel capacity
which the measured statistics show to be theoretically possible. In
other words, the results can give us various lower bounds to the
redundancy of television signals.

For the sake of illustration, suppose that the signal is quantized into
64 amplitude levels. An ordinary television channel assumes all 64
levels to be equally likely, and hence is prepared to accommodate $\log_2 64$,
or 6, bits per sample. But the simple amplitude distribution of the
signal is not flat, so that all 64 levels are not equally likely. The maximum
possible associated average information content per sample is given by

$$H_{\max} = -\sum_{i=1}^{64} p_i \log_2 p_i \tag{4}$$

where $p_i$ is the simple probability of the signal's falling into the $i$th
level. Since the 64 $p_i$'s are unequal, $H_{\max}$ is necessarily less than 6 bits.
For all available data the average value of $H_{\max}$ turns out to be
approximately 5 bits, indicating a one-bit redundancy. The latter figure
is essentially independent of quantization.
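Eq. (4) is short enough to state as code. The sketch below accepts either probabilities or raw counts (it normalizes them first) and skips empty levels, since $p \log p \to 0$ as $p \to 0$. The uneven distribution in the example is hypothetical, chosen only to show that any departure from equal probabilities brings $H_{\max}$ below 6 bits.

```python
import math

def h_max(probs):
    """Eq. (4): entropy in bits per sample, treating samples as independent."""
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

flat = [1 / 64] * 64
print(h_max(flat))                       # 6.0

# Hypothetical uneven distribution: 32 levels twice as likely as the other 32
uneven = [2] * 32 + [1] * 32
h = h_max(uneven)
print(round(h, 4), round(6 - h, 4))      # 5.9183 0.0817  (entropy, redundancy)
```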

The prediction error signal still contains all the useful picture
information. The maximum possible information content per sample
(maximum in that all samples are assumed to be completely independent)
is still given by (4), but in this case the 64 values of $p_i$ are obtained
from the peaked error distribution. The average* result from all available
data turns out to be approximately 3.4 bits below the 6-bit ceiling, showing
that the original signal must have contained at least 3.4 bits of
redundancy.

* This average was computed by averaging the various redundancy values
obtained for the individual pictures, rather than averaging all statistical data and
then finding one corresponding average redundancy. The average computed here
is more favorable and can be realized only if optimum coding is performed on a
short-term basis rather than on the basis of one set of long-term statistics.
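The effect of applying Eq. (4) to the error rather than to the signal can be seen on a toy example. The sketch below is an illustration on a synthetic 64-level signal, not a reproduction of the measured 3.4-bit figure: the signal is a clipped random walk that changes by at most one level between samples, a hypothetical stand-in for a smooth picture line. Entropies are estimated from the empirical histograms of the signal and of its previous-value prediction error.

```python
import numpy as np

def empirical_entropy(values):
    """Entropy in bits of the empirical distribution of the given samples."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
steps = rng.integers(-1, 2, size=20_000)      # each step is -1, 0 or +1
level, signal = 32, []
for s in steps:
    level = min(63, max(0, level + int(s)))   # stay within the 64 levels 0..63
    signal.append(level)
signal = np.array(signal)

error = np.diff(signal)                       # previous-value prediction error
h_signal = empirical_entropy(signal)
h_error = empirical_entropy(error)
print(round(h_signal, 2), round(h_error, 2), round(6 - h_error, 2))
```

Since the error here can take only three values, its entropy cannot exceed $\log_2 3 \approx 1.58$ bits, while the signal itself spreads over many of the 64 levels.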

The autocorrelation can also furnish a lower bound to the redundancy,
as has been pointed out by P. Elias in his Letter to the Editor of the
Proceedings of the I.R.E. for July, 1951. If, for example, the correlation
$A_{10}$ between horizontally adjoining picture elements is high, the
corresponding lower-bound redundancy is very roughly equal to

$$R \approx -\tfrac{1}{2}\log_2\left(1 - A_{10}\right)\ \text{bits/sample} \tag{5}$$
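Eq. (5) can be evaluated directly. The sketch below plugs in the two $A_{10}$ values quoted for Fig. 7. These are only outputs of the rough formula for values read before the mathematical band limitation was applied, not redundancy figures reported in this article.

```python
import math

def redundancy_from_adjacent_correlation(a10):
    """Eq. (5): rough lower-bound redundancy in bits/sample, valid for high A10."""
    return -0.5 * math.log2(1.0 - a10)

for a10 in (0.75, 0.99):   # values quoted for Scenes C and B (Fig. 7)
    print(a10, round(redundancy_from_adjacent_correlation(a10), 2))
# 0.75 1.0
# 0.99 3.32
```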

Alternatively, taking the Fourier transform of the autocorrelation yields
the power spectrum $P(f)$, from which we can find the lower-bound
redundancy through the relation

$$R = -\frac{1}{2W}\int_0^W \log_2 P(f)\,df - \frac{1}{2}\log_2 W - \frac{1}{2}\log_2 K\ \text{bits/sample} \tag{6}$$

where $W$ is the bandwidth in cps and $\dfrac{1}{K} = \displaystyle\int_0^W P(f)\,df$.

Using either method, one obtains approximately 2.4 bits for the
average* of the available data. This is an approximate bound, in that
it applies strictly only to functions having Gaussian amplitude
distributions.
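The spectral bound can be evaluated numerically. Here the relation is read as one half the base-2 logarithm of the ratio between the arithmetic mean of $P(f)$ over the band and its geometric mean, with $1/K = \int_0^W P(f)\,df$. On that reading the bound vanishes for a flat spectrum. The sketch below assumes $P(f)$ is sampled at the midpoints of equal bins covering 0 to W. As a check it uses the spectrum of a first-order Markov sequence with sample-to-sample correlation $\rho$, for which the bound should equal $-\tfrac{1}{2}\log_2(1 - \rho^2)$. That comparison is a standard textbook result added here, not part of the original analysis.

```python
import numpy as np

def redundancy_from_spectrum(P, W):
    """Spectral redundancy bound, with P(f) sampled at midpoints of equal bins over 0..W cps."""
    P = np.asarray(P, dtype=float)
    df = W / P.size
    K = 1.0 / (np.sum(P) * df)                 # 1/K = integral of P(f) df
    int_log_P = np.sum(np.log2(P)) * df        # integral of log2 P(f) df
    return float(-int_log_P / (2 * W) - 0.5 * np.log2(W) - 0.5 * np.log2(K))

W = 4e6                                        # 4-mc band
N = 4096
f = (np.arange(N) + 0.5) * W / N               # bin midpoints

print(redundancy_from_spectrum(np.full(N, 3.0), W))  # flat spectrum: approximately 0

rho = 0.9                                      # hypothetical sample-to-sample correlation
omega = np.pi * f / W                          # 0..W maps onto 0..pi radians per sample
P_markov = 1.0 / (1.0 - 2 * rho * np.cos(omega) + rho ** 2)
print(round(redundancy_from_spectrum(P_markov, W), 4),
      round(float(-0.5 * np.log2(1 - rho ** 2)), 4))   # the two values agree
```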

Suppose, then, that we have exposed an average redundancy of at
least 3 bits per sample. This means a potential 3-bit reduction in the
channel capacity required for television transmission. In a 6-bit system
(64 amplitude levels) this means a 50 per cent reduction, and hence a
potential halving of the bandwidth with the aid of an ideal coding scheme.
It is true that the decorrelated signal is somewhat "frail," i.e.,
vulnerable to interference, so that it might be desirable to use a "rugged"
system of the PCM variety for transmission. Thus, if a Shannon-Fano
code were used, the 3-bit decorrelation should enable us to send
television by an average of 3 on-off pulses per picture sample rather than
6. This represents a two-to-one saving over the usual PCM bandwidth.
More spectacular reductions are likely to be achievable only by
tapping the large-scale redundancies mentioned earlier.
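Fano's splitting procedure for a Shannon-Fano code can be sketched in a few lines. Sort the symbols by probability, split the list where the two parts' totals are as nearly equal as possible, append 0 to the codes in one part and 1 to the other, and recurse. The sketch below assumes the error distribution is given as a dictionary of probabilities. The four-symbol distribution in the example is hypothetical and chosen with powers of one half, so that the average code length equals the entropy exactly.

```python
def shannon_fano(probs):
    """Fano's top-down code construction. probs maps symbol -> probability."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {symbol: "" for symbol, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)   # imbalance between the two parts
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = group[:best_i], group[best_i:]
        for symbol, _ in left:
            codes[symbol] += "0"
        for symbol, _ in right:
            codes[symbol] += "1"
        split(left)
        split(right)

    split(items)
    return codes

# Hypothetical prediction-error distribution (error value -> probability)
errors = {0: 0.5, 1: 0.25, -1: 0.125, 2: 0.125}
codes = shannon_fano(errors)
avg_len = sum(p * len(codes[e]) for e, p in errors.items())
print(codes)     # {0: '0', 1: '10', -1: '110', 2: '111'}
print(avg_len)   # 1.75
```

The most probable error, zero, receives the shortest code word, which is how a peaked error distribution translates into fewer on-off pulses per sample on average.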

* See previous footnote.

FRAME-TO-FRAME CORRELATION

There is, of course, a great deal of interest in the possibility of utilizing
the similarity between successive frames. Accordingly, adjacent-frame
correlation was measured for two typical motion-picture films, by means
of the apparatus described in the section on autocorrelation.* The results
were 0.80 and 0.80, after correction for the 4-mc bandwidth limitation.
This means that "previous-frame" prediction can remove only slightly
more than one bit of redundancy per sample. More complicated schemes
would presumably be more successful in taking advantage of the large
frame-to-frame redundancy which undoubtedly exists.

* The expression used in evaluating the correlation between frame 1 and frame
2 (any two frames) is

$$C_{12} = \frac{T_{12} - \overline{T}_1^{\,2}}{\overline{T}_{11} - \overline{T}_1^{\,2}} \tag{7}$$

where $T_{12}$ is the optical transmission of frames 1 and 2 in cascade, $\overline{T}_1$ is the
average of the individual transmissions of frames 1 and 2, and $\overline{T}_{11}$ is the average of the
transmissions of two cascaded slides of frame 1 and two cascaded slides of frame
2, respectively. In all cascade transmission measurements, the two frames must
be in precise register.
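The frame-correlation expression follows the same pattern as the pure autocorrelation: a cascade reading minus the square of an average single-frame reading, normalized by the same quantity for each frame cascaded with itself. The sketch below assumes, for illustration, that each frame is an array of transmission samples and that every reading is a mean over the aperture; the function name and the random test frames are hypothetical. It also applies the rough adjacent-element formula $-\tfrac{1}{2}\log_2(1 - C)$ to the measured value 0.80. Whether the "slightly more than one bit" estimate was obtained exactly this way is an assumption.

```python
import numpy as np

def frame_correlation(frame1, frame2):
    """Correlation of two registered frames computed from cascade transmissions."""
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    T12 = np.mean(f1 * f2)                                  # frames 1 and 2 in cascade
    T1_bar = 0.5 * (np.mean(f1) + np.mean(f2))              # average single-frame transmission
    T11_bar = 0.5 * (np.mean(f1 * f1) + np.mean(f2 * f2))   # average self-cascade transmission
    return float((T12 - T1_bar ** 2) / (T11_bar - T1_bar ** 2))

rng = np.random.default_rng(2)
frame_a = rng.uniform(0.2, 0.9, size=(100, 100))
frame_b = frame_a.copy()
print(round(frame_correlation(frame_a, frame_b), 6))   # identical frames: 1.0

# Rough redundancy implied by an adjacent-frame correlation of 0.80
print(round(float(-0.5 * np.log2(1 - 0.80)), 2))       # 1.16
```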



